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1. Introduction 

We present and analyze a matrix- valued optimization problem formulated 
to sparsify matrices while preserving the matrix null-spaces and certain spe- 
cial structural properties. Algorithms for sparsification of matrices have been 
used in various fields to reduce computational and/or data storage costs. Ex- 
amples and applications are given in [21 [3l HI [5] and the references therein. In 
general, these techniques lead to small perturbations only in the higher end 
of the spectrum and don't respect the null-space or near null-space. This is 
intentional and not necessarily a limitation of these algorithms. 

We have different objectives for sparsification. First, we want to preserve 
input matrix structural properties related to matrix subspaces, if it has any. 
Being Hermitian or complex-symmetric are just a couple of examples of such 
properties. Second, we want sparsification to minimally perturb the singu- 
lar values and both left and right singular vectors in the lower end of the 
spectrum. Third, we want to exactly preserve the left and right null-spaces 
if there are any. 

Our main contribution in this paper is designing an algorithm that fulfills 
all the three objectives mentioned above. We also provide theoretical results 
and numerical experiments to support our claims. The algorithm presented 
here is inspired from a less general but a similar sparsification algorithm in- 
troduced in our earlier publication [6] . That algorithm, in turn, was inspired 
from the results presented in j7]. Understanding the previous versions of the 
algorithm is not a prerequisite for understanding the one presented here. 

There has been a related work where one solves an optimization problem 
to find a sparse preconditioner [S]. However, the function to be minimized 
there was non-quadratic and non-smooth, the method worked only for sym- 
metric positive-definite matrices, the sparsity patterns were supposed to be 
chosen in advanced, and the method was quite slow. Our algorithm does 
not produce the optimal sparsified matrix as measured by condition number. 
Nonetheless, it does not suffer from these limitations and produces precon- 
ditioning matrices that are competitive in terms of condition number. 

Our first objective, of maintaining structure, is important for aesthetic 
as well as practical reasons. If an algorithm produces an asymmetric matrix 
by sparsifying a symmetric matrix, it just doesn't look like an "impartial" 
algorithm. On the practical side, if a property like symmetry were lost, one 
would have to resort to data structures and algorithms for asymmetric ma- 
trices which, in general, will be slower than those for symmetric matrices. 



The second and third objectives - not disturbing the near null-spaces and 
the corresponding singular values while also preserving the null-spaces - are 
important when the inverse of the matrix is to be applied to a vector rather 
than the matrix itself. Then, the inverse, or the pseudo-inverse, of the spar- 
sified matrix can then be used in place of the inverse of the dense matrix 
without incurring a large error due to sparsification. This also means that 
the sparsified matrix can be used to compute a preconditioner with the like- 
lihood that computation with it will be less expensive due to the sparsity we 
introduce. 

To fulfill these objectives, we design and analyze a matrix-valued, linearly 
constrained convex quadratic optimization problem. It works for a general 
real or complex matrix, not necessarily square, and produces a sparse output 
matrix of the same size. A suitable pattern for sparsity is also computed. 
The user can control the sparsity level easily. We prove that the output 
sparse matrix automatically belongs to certain subspaces without imposing 
any constraints for them if the input matrix belongs to them. In particular, 
this property holds for Hermitian, complex-symmetric, Hamiltonian, circu- 
lant, centrosymmetric, and persymmetric matrices and also for each of the 
skew counterparts. 

We briefly mention that the rows and columns of the input matrix should 
be well-scaled for the best performance of our sparsity pattern computation 
method. A rescaling by pre- and post-multiplication by a diagonal matrix 
leads to a well-scaled input matrix [9]. We intend to highlight the effects of 
scaling to our algorithm in a future publication. 

In practice, we are concerned with matrices of size less than a few thou- 
sands. Applications of our method include computation of reusable sparse 
preconditioning matrices for reliable and efficient (in time consumed and 
memory used) techniques for solutions of high-order finite element systems, 
where the element stiffness matrices are the candidates for sparsification [6] . 
We want to stress that the sparsification process is somewhat expensive be- 
cause we need to solve a matrix-valued problem and we want to respect the 
null-space and the near null-space rather than the other end of the spectrum. 
However, we can reuse the computed sparse matrices for multiple elements. 
We have also modified the algorithm slightly to speed it up by orders of 
magnitude and this is described in the next paper in the series [1]. 

Intuitively, any sparsification algorithm with objectives like ours will be 
more useful when a large fraction of the input matrix entries has small but 
non-zero magnitude relative to the larger entries. Typical high-order finite 



element matrices have this feature due to approximate cancellation that hap- 
pens when high order polynomials are integrated in the bilinear form. On 
the other extreme, Hadamard matrices are poor candidates for sparsification 
like ours. 

Of course, special techniques in some situations can generate sparse stiff- 
ness matrix without any explicit sparsification (TU]. However, our objective 
is to be as general as possible and let the sparsification handle any kind of 
input matrix. 

The second paper yj describes version 1.0 of TxSSA, which is a library 
implemented in C-I--I- and C with interfaces in C++, C, and MATLAB. It is 
an implementation of our sparsification algorithm. TxSSA is an acronym for 
Tech-X Corporation Sparse Spectral Approximation. The code is distributed 
under the BSD 3-clause open-source license and is available here. 



http : //code . google . com/p/txssa/ 



Here is an outline of the paper. In Section [2| we describe the optimization 
problem, provide a rationale for the choices we make, and prove that it 
is well-posed. In Section [3} we describe the algorithm for computing the 
sparsity pattern and state some of its properties. In Section |4| we prove a 
few theoretical bounds that relate the output matrix to the input matrix and 
other input parameters. In Section [5| we prove that our algorithm preserves 
many important matrix subspaces. Finally, in Section [6} we show some basic 
numerical results. Detailed results and other practical concerns are given in 
the second paper of this series p. 

2. An optimization problem for sparsification 

We work with rectangular complex matrices and complex vectors. Sim- 
plifications to the real or square case are straightforward. 

2.1. Notation 

Let C"^^" denote the vector space of rectangular complex matrices with 
m rows and n columns. Let C" denote the vector space of complex vec- 
tors of size m. The superscripts 'T', '*', and 'f denote transpose, conjugate 
transpose, and the Moore-Penrose pseudoinverse respectively. A bar on top 
of a quantity implies complex conjugate. Let A/'(-) denote the null-space and 
7^(-) denote the range-space of a matrix. The default norm used for matrices 



is the Frobenius norm. For vectors the Euchdean norm is the default. Oth- 
erwise norms have specific subscripts. As usual, the identity matrix of size n 
is denoted by /„. The symbol I is used in case the size is easily deducible. 

Let A G C"^" be the input matrix to be sparsified. The matrix A 
can be expressed using the singular value decomposition (SVD) as UTy* . 
Let r := rank(y4). The singular values of A are {ctj}™^ sorted in non- 

increasing order, the right singular vectors are {vi}^^^, and the left singular 
vectors are {mj}^i- We divide U and V into two blocks each based on the 
rank r. We get 

A = UT.V* = [Ui U2] S [Vi V2Y 

where f/i and Vi each have r columns. Similarly, 



where Hr is of size r x r, diagonal, and invertible unless r = 0. The columns 
in Ui and Vi form orthonormal basis for TZ{A) and TZ{A*), respectively. Simi- 
larly, the columns in U2 and V2 form orthonormal basis for JV{A*) and //{A), 
respectively. 

Using the rank-nullity theorem, we get 

Pji := dim(A/'(v4)) = n — r 

where pn is the "right nullity" . Similarly, 

Pl '■= dim(A^(y4*)) = m — r 

where pi is the "left nullity". Thus, Af{A*) e C^^^^ and Af{A) e C"^*'^. 

Let K = k{A) := WAW^ II^^IL t>e a condition number for rank-deficient 
matrices. Obviously, n = ^4^ > 1 for non-zero matrices and it generalizes 
the usual condition number, which is typically used for square non-singular 
matrices. 

Let Z = Z{A) G M™^" be a matrix that denotes a sparsity pattern 
corresponding to A. It contains zeros and ones only. A zero in Z means 
the entry at the corresponding position in the output matrix is fixed to be 
zero. A one in Z means the entry at corresponding position will be allowed 
to vary and can be non-zero. The pattern matrix is always real even for 
complex input matrices. We do not create separate patterns for real and 
imaginary parts of a complex matrix. The rationale behind this choice will 



be explained later in Remark 3.10 in Section 3.4, where we will also provide 



a concrete algorithm for computing Z{A) . 



2.2. A misfit functional 

Let X = X{A) G C™^" be a matrix produced by sparsification of A. 
Fix Xij = if Zij = 0. When the sparsity pattern is fixed, X belongs to 
a subspace of C™^". Here, i and j are indices such that 1 < i < m and 
1 < j < n. 

We define an SVD based quadratic "misfit" functional J G M to specify 
a difference between input matrix A and an arbitrary matrix X. 

Definition 2.1. 

V V 

J=J{X;A):=lY.^^\\{X-A)v,\\l + lj2^2\\iX*-A*)u.\\l W 

1 = 1 ' 4 = 1 ' 

This misfit quantifies the action of the unknown matrix X on the singu- 
lar vectors of A and penalizes the differences in near null-space with larger 
"weights" . This is just a symbolic expression for defining J. We shall see in 



Section |2.4| that the SVD of A does not have to be computed to compute J 
and its derivatives. 

We now pose a linearly constrained quadratic optimization problem to 
compute X. 

min J{X] A) such that 

XiA) c^f{x), 

AfiA*) ^Af{X*), and 

X has a specified sparsity pattern Z{A) (2) 

It is readily seen that the constraints are linear homogeneous equality con- 
straints and J is a quadratic function in the entries of X. Additionally, the 
misfit J{X] A) is quadratic and bounded below by zero for any X whether 
the constraints are imposed or not. This is one way to see that it is also a 
convex function of X. We can rewrite the minimization problem as follows. 

mm J(X; A) such that XV2 = 0, X*U2 = 0, and Xij = if iZ{A)),j = 0. 

(3) 



X 



2.3. Rationale for the two-term misfit 

We had posed a similar minimization problem in [6]. We used it to con- 
struct sparse preconditioners for high-order finite element problems. How- 
ever, the input matrix A had to be real and symmetric. The output matrix 
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was specifically constrained to be symmetric. The misfit there had only one 
term, rather than two as shown here, because symmetry of A and X implied 
that the two terms were equal and hence one was redundant. For the same 
reason only the right null-space was imposed. The left null-space constraint, 
or equivalently, the constraint related to the conjugate-transpose matrix's 
null-space, became redundant. Generalization to asymmetric matrices is one 
of the contributions of the current research. 

We now work with an arbitrary input matrix (complex, rectangular) and 
thus make the setting completely general. This is done by working with the 
singular values and singular vectors rather than eigenvalues and eigenvectors. 
We have also avoided explicit imposition of structural constraints, like matrix 
symmetry. 

There are multiple reasons behind the choice of not using explicit struc- 
tural constraints. In many cases, it is indeed possible to impose linear equal- 
ity constraints to impose that X belong to a certain subspace of matrices. 
However, explicit specification would lead to speciahzed code for any new 
subspace. For example, special handling of Hermitian, complex-symmetric, 
Hamiltonian, and many other matrices would be needed. Although imposing 
such constraints individually is simple, explicitly imposing such constraints 
doesn't tell us what to do in case of matrices without common structures. 
Consider a general matrix A such that ||yl — A*|| f^ 0.001 ||A||. It is reason- 
able to expect and desire a similar small difference in X and X*. However, in 
general, we cannot impose an affine equality constraint to impose this. Thus, 
even a tiny perturbation in input that destroys some structure results in a 
completely different mathematical and computational problem (assuming we 
were actually explicitly imposing X = X* when A = A*). These considera- 
tions motivated us to design a new algorithm where no such constraints are 
needed. We now use a two-term misfit, two null-space related constraint sets, 
and abandon such explicit constraints even in cases where it is meaningful. 
We show in Section |5] that such an algorithm still preserves matrix structural 
properties whenever they are present. 

2.4. Avoiding the SVD 

We now show that the misfit in Equation ([I]) can be expressed in terms of 
the Moore-Penrose pseudoinverse of the input matrix A. This avoids using its 
singular values and singular vectors for actual computation. We expressed the 
misfit using SVD first to show that the misfit penalizes deviations in the lower 



end of the spectrum of A and to motivate the formulation. See Section 2.1 
for the notation. 

Lemma 2.2. Let A,Y e C™^". Let A = UJ:V* be its SVD. Then 

Y ^\\Yvi\\^ = \\YAn? , and 



.=1 ^i 






F* 1 1 2 
. Mi 



Y\A 



*^t ' 



Proof. Using the SVD based expression for the Moore- Penrose pseudoinverse 
and the fact that the Frobenius matrix norm is invariant under multiphcation 
by a unitary matrix, it is easily seen that 

\\YAm^ = \\yvj:^u*\\'^ = ||rys"^||^ 

The last term can be expanded to a summation using Vi, the columns of V, 
and — , the non-zero diagonal values of T,^. This results in the left hand side 
of the first equality. Thus the first equality holds. The second equality can 
be derived in the same fashion. D 

The previous result and the property that ||y|| = \\Y*\\ are used to 
express the misfit J from Equation ([I| as 

JiX;A) = l\\{X-A)A%+^-\\A^{X-A)\\l (4) 

This avoids the use of singular vectors and singular values and one avoid us- 
ing the SVD. The pseudoinverse or a reasonable approximation to it can be 
computed by any algorithm that is suitable depending on the conditioning 
and any known matrix properties. By way of example, if the rank-revealing 
QR algorithm reliably determines the numerical rank of A, it can be used to 
compute an accurate pseudoinverse. On the other hand if A were symmet- 
ric semi-definite too, the pivoted Cholesky will be faster for computing the 
pseudoinverse. 

This is a good place to mention that we are indeed making the assumption 
that the condition number of the input matrix is not "too large". This 
assumption is made so that one can avoid SVD and use a faster but less robust 
algorithm to compute the pseudoinverse. More importantly, if the condition 
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number of A were very large, its invariant spaces will be highly sensitive to 
any perturbation applied to A. Sparsifying might be worthless even if SVD 
is used. What is a "too large" condition number can can be quantified after 
some numerical experimentation and will also strongly depend on the relative 
number of zeros introduced via sparsification. 

2.5. First-order optimality conditions 

We derive the first-order optimality conditions for the minimization prob- 
lem using the method of Lagrange multipliers. Since the quadratic form 
in Equation ^ is convex for all A (see Section 2.2) and the set of feasible 



points is convex, any stationary point will be a local minima. Hence it is 
sufficient to consider solutions of the first-order optimality conditions. 

2.5.1. The Lagrange multipliers 

We introduce the Lagrange function C = C{X, Ar, Al, Az] A), where X is 
the unknown sparse matrix, A^ G C"^^^ and A^, G C^^^-^ are matrices of La- 
grange multipliers corresponding to the right and left null-space constraints, 
respectively. To impose the sparsity pattern, we use a matrix Az G C"*^", a 
matrix of Lagrange multipliers and zeros. If {Z{A))ij = 0, {Az)ij = fiij G C 
else {Az)ij = 0. 

CiX,An,AL,Az;A) ■.=^\\{X - A)A^\' + ^\\A\X - A)\\' 

+ trace(A^Xl^2) 
+ tTace{AlX*U2) 

ij where 

2.5.2. Derivative of the misfit 

Before differentiating C, we write the derivative of J with respect to X. 
We get 

^ = {X- A)A\A^)* + {A^)*A\X - A) 
oX 

= {XA^{A^y + (A^YA^X) - {AA^{A^)* + (A^yA^A) 

= {xA\A^y + [A^yA^x] - 2{A^y 



The simplification done above, in the terms not involving X, can be proved 
easily using the SVD based expression of the Moore-Penrose pseudoinverse. 
Using the relation between the "vectorization operation" and the Kro- 
necker product, we get 

vec (^\ = (At (At)* ® /^ + /„ ® (At)*At) vec(X) 

where /^ is the identity matrix of size k. We define the mn x mn Kronecker 
sum matrix A = A{A) as 

A = AiA) := A\A^y ®I^ + In® (At)Mt (5) 

Using the spectral properties of Kronecker sums, it is obvious that A{A) is 
always Hermitian positive semi-definite. It is Hermitian positive definite if 
and only if A is full-rank. We skip the proofs. 

2. 5. 3. Derivative of the Lagrangian 

We differentiate the Lagrangian given in Section 2.5.1 to derive the first 
order optimality conditions. Instead of presenting the intermediate steps in 
the derivation we write the final expressions only. Appendix A in [11] shows 
the intermediate steps for matrix-valued derivatives. 



dX 
dC 

dC 

dC 

dfiij 



=^ XAt(At)* + {A^YA^X + ArV; + U2AI + Az = 2(At)* 

^xy2 = o 

=^ X*U2 = 

=^ Xij = for ij such that iZ{A))^j = 



2.6. Existence and global uniqueness of the minimizer 

We show that the minimization problem posed in Equation ^ and with 
the linear system shown in Section 2.5.3| has a unique solution for any input 



matrix A and any imposed sparsity pattern. This is true even if the sparsity 
pattern has no relation to the input matrix A. 

Lemma 2.3. A minimizer always exists for the minimization problem posed 
in Equation M) for an arbitrary imposed sparsity pattern. 
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Proof. The equality constraints for null-spaces and sparsity are linear and 
homogeneous. Thus, the set of feasible solutions is non-empty. For example, 
the zero matrix is a feasible element. Since the quadratic form is convex for 



all X (see Section 2.2), a minimizer always exists [12j. D 



We now have to prove that there is a globally unique minimizer. We do 
this first for the full-rank A and then for a non-zero rank-deficient A. 

Lemma 2.4. // A is full-rank, the minimization problem posed in Equa- 
tion M) has a globally unique minimizer for an arbitrary imposed sparsity 
pattern. 



Proof. As mentioned in Section |2.5.2[ the quadratic form J is strictly convex 
on C™^'^ if A is full-rank. In particular, it is strictly convex on the subspace 
of those matrices that satisfy the equality constraints. Since the feasible set 
is convex and non-empty, the minimizer is globally unique. D 

Lemma 2.5. // A is non-zero and rank- deficient, the minimization prob- 
lem posed in Equation ^ has a globally unique minimizer for an arbitrary 
imposed sparsity pattern. 

Proof. For rank-deficient matrices the Hessian of the quadratic form is merely 
positive semi-definite on C"^^". But we show that it is positive definite 
on the subspace of matrices that satisfy the null-space related constraints. 
Showing this will ensure that the Hessian is positive definite on the subspace 
of matrices satisfying all the constraints (which includes sparsity constraints 
also). This, in turn, means that there is a globally unique minimizer even 
for the rank-deficient case. 

Using the fact that U and V are unitary, it can be easily shown that 
all matrices X that satisfy the null-space constraints can be written as a 
linear combination of r^ basis matrices using arbitrary complex coefficients 

r r 

i=l j=l 

As defined earlier, Ui and Vj are the left and right singular vectors of A 
respectively. 
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Consider the action of the Kronecker sum matrix A{A) (Equation ([5 



on the vectorized basis matrices UiV*. 



A{A)vec{uiV*) = {A\A^y (^I^ + h® {A^fA^) Yec{u,v*) 

{u,v*A\A^y + {A^YA^UiV*) 



vec 



Yec \ Ui^{A^y + {A^y -V 



, 1 * 1 * 

vec I Ui^jv, + Ui^jv, 

1 1 

^ H — ^ I vecfujf ' 
af aj I ^' 



Since both z, j < r, ctj > and aj > 0. Thus, vec(-Uif*) are eigenvectors of 
the Hessian matrix and the corresponding eigenvalues are positive. Hence, 
the Hessian restricted to the subspace of matrices that satisfy the null-space 
constraints is positive definite. D 

Theorem 2.6. A globally unique minimizer exists for the minimization prob- 
lem posed in Equation [^ for an arbitrary imposed sparsity pattern. 



Proof. This is a consequence of the three lemmas proved earlier — Lemmas 



2.3, 2.4, 2.5, and the easily seen fact that ii A = 0, then X = is the only 



feasible solution. D 

3. An Lp norm based algorithm to compute the sparsity pattern 

In this section, we describe an algorithm for determination of sparsity 
pattern in detail, analyze its computational complexity, state and prove some 
of its properties, and provide the rationale behind a few subtle technical 
issues. 

3.1. Overview of the sparsity pattern algorithm 



We showed in Section 2^ that the minimization problem to compute a 
sparse X has a unique solution for any imposed sparsity pattern. However, 
the sparsity determination phase is separate than the minimization phase 
and it is important to have a well-defined and universal method of choosing 
a sparsity pattern rather than imposing something arbitrary. 
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We had presented an Li norm based algorithm designed for real square 
matrices in [6] . Here we generalize for arbitrary matrices while also using the 
Lp norm. 

As before, we also provide a single parameter g, continuously variable in 
the interval [0,1], that allows a user to choose anywhere between the two 
extremes of very sparse (g = 0) and no extra sparsity (g = 1). Using a 
larger q will provide a better approximation but will be more dense, whereas 
a smaller q will provide a worse approximation but lead to fewer non-zeros 
inX. 

In any case, we eliminate only those entries that are small in magnitude 
relative to other larger entries. Intuitively, this makes sense because eliminat- 
ing small matrix entries will perturb the matrix and its spectrum relatively 
less. 

In choosing entries with large magnitude, our algorithm compares entries 
in the same row and column rather than with all other matrix entries. This 
choice helps in obtaining many important properties of the algorithm. The 



details and rationale are provided later in Section 3.5 



3.2. Definition of an Lp norm based sparsity for vectors 

Before describing the sparsity pattern algorithm for a matrix, we describe 
it for a single vector x G C™. This will be the key ingredient for the algorithm 
acting on a matrix. 



Definition 3.1. For any p G [0,oo], we define the Lp "norm" 
vector X. 



\x\\ for any 



\x\ 



E 



Jji 






Xi 



1=1 

number of non-zero x, 



1 <p < oo 
p = oo 

<p <1 

p = 



Note that ||a;|| is a "norm" only when p G [1, oo]. It is not a "norm" for 
p < 1. We simplify the notation and call it a norm always. 

We want to compute a sparsity pattern Z{x) for x G C™. Z{x) G M"^. 
It contains zeros and ones only. A zero in Z means the entry at the corre- 
sponding position is fixed to be zero otherwise it is free to be modified. 
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3.2.1. A combinatorial minimization problem, for vector sparsity pattern 

We state below a combinatorial minimization problem for computing 
Z{x) so that the number of eliminated entries is maximized while the Lp 
norm of the eliminated entries is bounded from above. It is a combinatorial 
optimization problem because we can place ones and zeros in Z{x) at arbi- 
trary locations. In addition, the number of non- zeros in Z{x) is unknown a 
priori. 

We specify the input parameters -p G [0, oo], g G [0, 1], and iV G [0, | |x| y. 
Here N refers to the minimum number of non-zeros to be preserved. It makes 
sense to have the upper limit for A^ not greater than ||x||g, which is the 
number of non- zeros present in x. The necessity of A^ will be discussed in 



Section 3.4.2 The symbol 'o' denotes entry-wise product of two entities. 



Given x,p,q, and A^, compute Z{x) using 
max 

Z{x) 



laauxllx — X o Z(x)\\q such that 



(Zix)), = or 1, 
\\Z{x)\\q>N, and 
\\x-xoZ{x)\\j^ < (l-g)||a;||p 

If X = 0, we define Z{x) = G M*". 

Before presenting an algorithm for computing Z{x), we remark on a few 
subtle cases. 

Remark 3.2. We allow q to take the extreme values, zero or one, in the 
definition, but in practical cases it will take an intermediate value usually in 

[0.5,1). 

The form of the q related inequality condition in our problem is moti- 
vated by two reasons — maintaining a role for g in p = oo case and better 
discrimination power for large p case. We explain this in the remarks below. 

Remark 3.3. One reason we define the q related inequality in terms of Lp 
norm of eliminated entries (||x — x o 2^(x)|| ) and not in terms of the norm 
of preserved entries (||x o Z(x)|| ) has to do with the large p case, especially 
p = CO. Assume p = oo and g < 1 and we impose ||xoZ(x)|| > g||x|| 
instead. In this case, preserving the entry (or entries) with the maximum 
magnitude would mean that the norm of preserved entries ||xo2(x)|| is 
greater than g | |x| | . Since this is true for any g < 1, that parameter becomes 
useless. This is avoided in Equation ^ by choosing a different inequality. 
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Remark 3.4. Another reason for using the norm of ehminated entries is when 
p is large but not necessarily infinite. In such a case, moving the smallest 
entry in the preserved part to eliminated part will affect the norm of elimi- 
nated part much more. Thus, comparing the norm of eliminated part with 
total norm has more "discrimination power" , specially for large p. For small 
p, close to one or even less than it, norms of both parts — eliminated and 
preserved — are affected roughly equally by such a move. 

3. 3. An algorithm to compute sparsity for vectors 

We now present an algorithm to compute the sparsity pattern Z{x) for 
the problem posed in Equation ([6]). See Algorithm [l| 

Remark 3.5. If there are multiple candidate entries in x with the same mag- 
nitude that can lead to a non-unique 2{x), then any one of them can be 
used. This is a "corner case" and a simple rule like using all the equal values, 
even if fewer than all are sufficient, will break the tie and give a deterministic 
algorithm. In practical problems, we don't expect that the vector will have 
a huge number of equal and large entries of exactly equal magnitude. 

Theorem 3.6. Algorithm^ solves the optimization problem posed in Equa- 
tion 1^ to find the sparsity pattern for a given vector. 



Proof. We start with the three constraints in Equation (|6|. Obviously, the 
output Z{x) is such that {Z{x))i = or 1. The second "while" loop ensures 
that ||Z(a:)||Q > A^. The first while loop makes sure that ||x — x o Z(a;)|| < 
(1 — q) \\x\\p and the second one can only decrease ||x — a; o Z(x)|| . 

Hence all the conditions are maintained and the issue is whether the 
output pattern maximizes the number of eliminated entries. It is clear from 
the algorithm that if a particular entry is eliminated, all entries smaller than 
that must have been eliminated. Assume that we want to eliminate one more 
(non-zero) entry. Doing this will violate the either the second or the third 
constraint in Equation ^ depending on which of the two was the active one 
in limiting the number of ones in Z{x). D 

3.3.1. Computational complexity for sparsity pattern of vectors 

For a vector of size m. Algorithm [I] runs in 0{m \og{m)) operations. This 
assumes that an 0{m\og{m)) algorithm is used for sorting. Hence the overall 
complexity is 0(m log(r7i)). 
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Data: x eC"',pe [O,oo],qe [0,1], and N e [0, ||x||o] 
Result: Sparsity pattern Z{x) E M™ with (Z{x))i = or 1 

{Z{x))i ^ for i = 1, . . . , m; 
j -^ num-zeros{x); 
if j = m then 
I return; 
end 

X <— abs(x); 
ids •(— [1, 2, . . . ,m]; 
Sort (x, ids) pairs in an ascending order with Xi values as the keys; 

// Eliminate entries until a threshold is not crossed 

while J < m — A^ do 

if ||a;(l : j + l)||p > (1 - g) \\x\\^ then 

I break; 
else 

id i— ids{j); 
{Z{x))u ^ 1; 

J ^ J + 1; 

end 
end 

// Preserve the remaining entries 

while j < m do 

id ^ ids{j); 
{Z{x))id ^ 1; 
J ^J + 1; 
end 

Algorithm 1: An algorithm to compute sparsity pattern of a given real 
or complex vector. See Equation (|6|) for details. 
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3.4- Definition of an Lp norm based sparsity for matrices 

We use the sparsity pattern computation algorithm for vectors described 



in Section 3.3 to compute a sparsity pattern for matrices. The algorithm for 
matrices can be divided into three separate steps. The first two steps can 
be executed independently and thus in any order. Assume we have a matrix 
A G C'"^", and parameters p, q. We also need the parameters Nrow and 
Ncoi, which are used to specify minimum number of non- zeros in each row 



and each column, respectively. In Section 3.4.2 we will see how N^ow and 
Ncoi are determined and specified a natural way. Here are the three steps for 
computing Z{A). 

1. Compute the pattern in each of the m separate rows (treated as vec- 
tors). 

2. Compute the pattern in each of the n separate columns (treated as 
vectors). 

3. The final pattern Z{A) is the union, a boolean OR, of the row-based 
pattern and the transpose of column-based pattern. 

The algorithm presented above is a specific choice that satisfies many useful 
theoretical properties. See Section [33 



Remark 3.7. If it is known that the entry-wise absolute value matrix is sym- 
metric, then we compute either the row or column pattern and take the 
union with its transpose. This shortcut is not necessary but is faster. More- 
over, whether one takes the shortcut or not, the result is that the pattern 
is symmetric for matrices that are Hermitian, skew-Hermitian, or complex- 
symmetric. 

Remark 3.8. In general, the number of non-zeros will vary in each row and 
column depending on the distribution of the magnitudes of the entries. In 
this sense, this is an adaptive method and one does not know the number of 
non-zeros a priori for a given set of parameters. 

Remark 3.9. The algorithm works naturally with rectangular and complex 
matrices. 

Remark 3.10. We compute a single pattern matrix for complex matrices 
rather than separate independent patterns matrices for real and imaginary 
parts for two distinct reasons — one theoretical and one practical. Firstly, 



17 



when a single pattern is computed, the pattern remains invariant if the input 
matrix is muhiphed by a non-zero complex number (see Property |2] in Sec- 
tion 3.5). This property will not hold in general when two patterns are 
computed. Secondly, storing two patterns for the sparsified X, for the real 
and imaginary parts, increases the storage costs and one cannot use complex 



arithmetic for first-order optimality conditions (see Section 2.5.3) 



3. 4-1- Computational complexity for sparsity pattern of matrices 

We discuss the computational complexity for each of the three steps in 



Section 3.4 using the computational complexity for sparsity pattern of vectors 



in Section [3.3. 11 

1. Computing the sparsity pattern for m rows requires m{nlog{n)) oper- 
ations in total. 

2. Computing the sparsity pattern for n columns requires n{m\og{m)) 
operations in total. 

3. Computing the union operation is relatively more complex. Creating 
the transpose of a sparse matrix takes 0{mn) operations. After trans- 
posing, the union is computed row by row. Computing union of the 
patterns of two row vectors of size n each takes 0{n log(n)) operations. 
This is done m times. 

Hence, the overall cost is still 0{mn{log{m) +log{n))) which can be expressed 
as 0{mnlog{mn)). 

3.4-2. Interaction of sparsity and null-space 

We mentioned earlier that we can specify the parameters Nrow and Ncoi 
when computing the sparsity pattern of a matrix. The two values are used 
to specify minimum number of non-zeros in each row and each column, re- 
spectively. Here we show why one needs these parameters. 

Consider a matrix X G C™^" with a given sparsity pattern Z{A). When 
solving the optimization problem posed in Section 2.2[ the unknown entries 



in each row of X have to satisfy pr linear homogeneous equality constraints 



(see Section 2.1 ). If the number of allowed non-zero entries in a row of Z{A) 
is less than or equal to pr = dira{jV{A)), then all the entries must be zero, 
and thus the whole row is zero. This is an undesirable situation. The same 
logic applies to columns and left null-space. Thus, an algorithm that decides 



the sparsity pattern should also keep sufficient number of non-zeros in each 
row and each column so that such degenerate matrix is not produced. In 
practice, we choose Nrow = min(n,pR + 1) and Ncoi = min(m,pL + 1). 

This restriction implies that the null-space dimension must be known 
before computing the sparsity pattern. Another implication is that sparsi- 
fication while maintaining null-space constraints cannot be very useful if a 
matrix is highly rank- deficient. In many applications the rank-deficiency is 
a small constant independent of matrix size and this is not a huge concern. 

3.5. Properties of the Lp norm based matrix sparsity patterns 

We enumerate a few important properties of the sparsity patterns gen- 



erated by the Lp norm based algorithm of Section 3.4 Let A G C™^", and 
Z{A) G M™^" denote its sparsity pattern matrix. The number of non-zero 
entries in Z{A) is expressed as |Z(yl)|. When we want to discuss a specific 
parameters p and q, we write Z{A;p, q) instead. 

We also need the notion of complex permutation matrices [T3| Section 
IV. 1]. 

Definition 3.11. A complex permutation matrix is a matrix such that it has 
a only one non-zero entry in each row and each column and every non-zero 
entry is a complex number of modulus 1. 

Complex permutation matrices are always square and unitary. We use 
the letters P and Q to denote them. 

It can be shown that Z{A) satisfies the following properties. The proofs 
are elementary and we skip them. 

P-1 Ay = =^ {Z{A))i, = 0. 

P-2 Z{aA) = Z{A) for a G C \ {0}. 

P-3 ZiA^) = iZ{A)f. 

P-4 Z{A*) = iZ{A)f. 

P-5 Z{A) does not depend on the signs of entries of A. 

P-6 Z{PAQ) = \P\Z{A) \Q\, where P,Q are any size-compatible complex 
permutation matrices and |-| denotes entry- wise modulus. 

P-7 gi < q2 =^ Z{A;p,qi) < Z{A;p,q2) entry-wise, where gi and g2 are 
any two sparsity parameters. 

19 



4. A priori and a posteriori bounds related to the misfit 

Our goal in this section is to prove three theoretical bounds related to 
the sparsification algorithm. 

Definition 4.1. For a given A G C™^" and the Lp norm sparsity threshold 
parameter q, 

A'"^ ■.= Z{A;p,q)oA. 

Here 'o' denotes entry-wise multiplication. Thus, A^^'^ G C™^" is the 
matrix which is obtained by setting those entries of A to zero that correspond 
to zero values in the pattern Z{A;p, q). 

For proving the bounds below, we restrict p to be in [1, cxo] so that the 
standard Lp norm related inequalities are applicable. For A G C"^", we will 
show that 

||A-AP«||2< {mn)^{l-q)\\A\\^ 

where 

pip 

A few numerical experiments show that this upper bound is not too loose 
when p is close to 1. However, it is quite pessimistic as p gets larger than 2. 
Our purpose is here is solely to show that by sparsifying each row and each 
column individually we can bound the perturbation error for the full matrix. 
For a square non-singular matrix A and for X that satisfies the sparsity 
constraints, we will show that 

min J(X; A) < m^+^^(l - qfK{Af. 

For an arbitrary A and the corresponding misfit-minimizing X, we show 
that all the non-zero eigenvalues of XA"^ and A'^X are within a circle of radius 
\/2Jmin centered at (1, 0) in the complex plane. Here Jmin is the minimum 
misfit value. Provided Jmin < ^ and that the left and right nullities of X are 
not greater than those oi A, we show that 



(7) 



max{K{XA^),K{A^X)) < 



1 -|- \/1j„ 



1 — y2Jrr 
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4.I. An upper bound for the perturbation 

Theorem 4.2. For A G C"^", p G [1, oo], q G [0, 1], A^'^ defined in Defini- 
tion 4-i. o-nd C expressed in Equation ^, 



p-AP«||2< imn)'^{l-q)\\A\\^. 

Proof. We first define the usual dual norm parameter p'. 
Definition 4.3. The number p' is the dual oi p and is defined by 

- + - = 1 ^ P = -. 

p p' p — 1 

In case of p = 1 and p = oo, the appropriate limiting values are used and 
p' = oo and p' = 1, respectively. 

The way Z{A]p,q) is computed, by working with each row and each 



column (see Section 3.4), it is obvious that the following inequalities hold. 



We use the MATLAB notation. 

\\A{z,:)-A^'^{z,:)\l<{l-q)\\A{z,:)\l^ 
m:,j)-A^\:,j)\l<il-q)\\A{:,j)\\^ 

Taking maximums on each side, we get the following two inequalities. 

max||A(^,:)-A^^(z,:)||^<(l-g)max||A(^,:)||^ (8) 

max||A(:,j)-^"'^0,j)llp<(l-9)max||A(:,j)||, (9) 

We use the following standard results valid for any matrix Y G C"^" [I 
Section 6.3]. 



UP 



'--1 



r||^<max||F(:,j)||,<||l^||p 



J 



mp ^ \\Y\\,^, < max||y(i,:)||p < \\Y\\p, . 

Using the substitutions Y ^ A — A^'^ and Y ^ A separately and combining 
the results with Equations ^ and ^ we get, 

J-' \\A - A^l^ < {1 - q) \\A\l^ 
mp-'\\A-AP'^\\^,<{l-q)\\A\\^,. 
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We now make use of the following standard result for pi,p2 G [1, oo] and 
9 G [0, 1] [m Section 6.3]. Let pu be such that 

1 i~e e 



Pl2 P2 Pi 

Then, 

l|y|| < iirii^ IIF||^"^ 

Choosing pi = p, P2 = p', and 9 = ^ implies pi2 = 2. This gives 

We now need a bound for 1 1^| L 1 1^| L/ in terms of | |v4| I2. Using the results in 
[m Section 6.3] again, it can be shown that 

\\A\\^\\A\\^,<{mnp-'^\\\A\\l. 

Using this inequality and taking square roots, we get the result we started 
to prove. 

\\A-AP%< {mn)^ (1 - q) \\A\\^ 

n 

4-2. An upper bound for the misfit functional 

Theorem 4.4. For a square non-singular matrix A and for X that satisfies 
the sparsity constraints, 

min J(X; A) < m^+^c^^ _ qf^iAf. 

where C is defined in Equation 1^. 
Proof. For n = m, Theorem 4. 2 [ gives 



Let X be the sparse matrix minimizing the misfit functional J{X;A). The 
only constraints on X are due to sparsity. The matrix A^'^ satisfies the 
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sparsity constraint as well. We want to find an upper bound on J(X; A) in 
terms of p, q, and A given that \\A — A^'^||2 is bounded. 

JiX;A)<JiAP'i;A) = ^\\(A^'i - A)A-% + ^\\A-\AP'i - A)\\l 



< "^ (^WiA^" - A)A-% + \\A-\AP'^ - A)\\'^ 

< '^(^\\AP''-A\\l\\A-%+\\A-%\\AP'^-A\\l 



mWAP^-AWlWA-^ll 

II I I ■^ I I 112 



< m'+^^'il-qf \\A\\l\\A-' 
= m'^^'='{l-qfKiAf 



Taking minimum with respect to X proves the result. D 

This analysis is useful in showing that apart from the sparsity parameter 
q, the condition number of A will likely play an important role too. That is 
to say that if A is not very well conditioned, one would require a denser X, 
by using a larger q, to obtain a smaller misfit between X and A. 

Remark 4.5. Note that we first bound the Frobenius norm differences in terms 
of spectral norm differences and then use the submultiplicative norm property 
instead of doing it the other order. This is because doing it the other way 
would have introduced another power of m and led to a less tighter bound. 

Remark 4.6. The bound proved above is pessimistic for two reasons. First, it 
does use the fact that X is the minimizer but it does not quantify the effect 
of minimization. Second, it makes the worst case assumptions in using the 
matrix norm equivalence relations. 

4-3. A posteriori bounds related to clustering and conditioning 

We now relate the minimum value of the misfit functional J to generalized 
condition numbers and eigenvalues of XA^ G C™^'" and A'^X G C"^", where 
X is the misfit minimizing matrix. Note that X also satisfies the null-space 
related constraints. These bounds do not depend on any specific chosen 
sparsity pattern. They are pessimistic bounds and are useful for qualitative 
understanding. 

Define Jmm to be minimum value of misfit for any fixed chosen sparsity 
pattern. 
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Theorem 4.7. All the non-zero eigenvalues of XA^ and A^X are within a 
circle of radius ^1 J rain centered at (1,0) in the complex plane. 



Proof. As shown in Section 2.1, the matrix A can be factorized as UiT^rVi. 
Since X satisfies the nuU-space related constraints, it can be expressed as 
UiYV{', where Y e C^"". We can simphfy J^m- 

T ■ - -iiyr "^ - 7'iP +-\\y -^Y - /'IP 

^mm oil *" I If ^^ 9 I \ T \\F 



Thus, ||ySr^^— /||i? and ||Sj.^^l^— /||f are both less than or equal to \/2Jmin- 
We use the fact that the magnitude of each eigenvalue of a matrix is less than 
the Frobenius norm. Thus, all eigenvalues of yS^"^ — / and Tjr~^Y — I have 
a magnitude less than a/2 J^m- This means all eigenvalues of FS^^^ and 
S,f.~^y are within a circle of radius \/ 2 J rain around (1,0). 

It is trivial to show that ignoring any zero eigenvalues, the eigenvalues of 
XA^ and A^X are same as the eigenvalues of yS^"^ and Sr~^y, respectively. 
This completes the proof. D 

The second result relates generalized condition numbers of XA^ and A^X 
with the value of Jmin- 



Theorem 4.8. Let X he the minimizing matrix that satisfies all the con- 
straints in Equation M) and Jmin be the minimum value. Provided J, 
and that the left and right nullities of X are not greater than those of A, 



< 1 

mm ^ 2 



max{K{XA^), k{A^X)) < ^ + ^'^ 



■ V2Jn 

Proof. We show that 



i y ^Jmin 

The proof for k(^A^X^ is similar and combining the two will bound their 
maximum and prove the theorem. 



Using the notation and steps in the proof of Theorem 4.7, we get 



K 



(XAt) 



^ininl^-'^ ^r y 



Since the left and right nullities of X are not greater than those of A, Y and 
hence YT.r~^ are full-rank matrices. This means the denominator above is 



non-zero. 
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We bound the numerator from above. 
(JmaxCi^Sr""^) = max- 



\x\ 



max 



< max 



|(/-(/-rS,-^))a;| 



\x\ 



\x\\+ (j-rsr )2; 



\x\ 



= i + ||/-rEr^| 

< l+\\l-Y^r \ 



< l + ^/2I„ 



We bound the denominator from below assuming ^2 J mm < 1- 

1 ZjT> X 



'^minl^-' ^ 



-1\ 



mm 



mm 

x^O 



X 



> min 

x^O 



mm 

x^O 



|(/-(/-FS,-^))a;|| 
||x||- ||(/-rs,-^)a;| 



X 



|(/-rsr^)x| 



X 



The following chain of inequalities hold. 



(10) 



\x\ 



<\\I-YK'%< ||/-FS,-'|L< v^2J~<l 



This means the Equation (|10|) can be simplified as follows. 



mm 

x^O 



1 



(j-rsr )2; 



1 — max 

1- \\I -YK 



\x\ 



-11 



> i~\\i -yj:-^ 



""^ J- \/ ^^m.in.' 
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Hence 



and we are done. D 

5. Subspace preserving sparsification 

We now show that the output sparse matrix automatically belongs to cer- 
tain subspaces, without imposing special constraints for them, if the input 
matrix belongs to them. In particular, this property holds for Hermitian, 
complex-symmetric, Hamiltonian, circulant, centrosymmetric, and persym- 
metric matrices and also for each of the skew counterparts. Obviously, prov- 
ing this will require that all the components of the algorithm - the sparsity 
constraints, the null-space related constraints, the form of the misfit func- 
tional - have features that make the preservation of subspaces possible. We 
first prove the results for general permutation transformations and then apply 



the results to specific complex permutation matrices in Section 5.2 



5.1. Invariance of the general minimization problem 

We prove a few intermediate results. Here is the required notation. Let 
a G C, |a| = 1 be a constant, P,Q & (^mxm ^^ (^nxn ^^ complex permu- 
tation matrices, and Y G C™^". Let Y"^ = Y or Y* or Y'^ , where op is 
a placeholder and means operation. Then (Y°pyP = Y, \\Y°P\\p = \\Y\\p, 
{aPA°PQ)^ = {l/a)Q*{A°P)^P* = aQ*{A"P)^P*, and {A°p)^ = (At)°^. If op 
denotes transpose or conjugate transpose (YiY2y^ = Y^^Yx^ . The proofs of 
each of these equalities is trivial. 

Definition 5.1. Let /(V) = aPY^^Q for some given a, P, and Q where 
|a| = 1 and P and Q are appropriately sized complex permutation matrices. 

Lemma 5.2. J{X\A) = J(f(X);f(A)), where J{X;A) is the misfit func- 
tional defined in Equation 01). 



Proof. Define 



MX;A) = ^\\iX - A)A^\\' 
Jl{X;A) = ^-\\A\X-A)\\' 
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so that J(X; A) = Jr{X; A) + Jl{,X; A). The subscripts L and R stand for 
left and right, respectively, and refer to the side on which the pseudoinverse 
is applied. Then 



jR{f{X)-f{A)) = I {aPX°^Q-aPA"PQ){aPA°^Q) 



2 
1 

2 
-||(X°P-A°P)(At)°^||' 



aP{X"P - A''P)QaQ*{A°PyP* 



If op denotes transpose or conjugate transpose, the equality above shows that 

]^\\{X"^ - A"P){AT\\^ = ]^\\A\X - A)\\^ = Jl{X-A). 
If op denotes no change (that is Y°^ = Y), then 

^\\{X"^-A''n{A^r\\'=^-\\{X-A)A^\\' = MX;A). 

A similar result can be proved by expanding JL{f{X); f{A)). For each sub- 
stitution for op, J«(X; A) + MX; A) = Jr(/(X); /(A)) + MfiX); /(A)). 
Hence proved. D 

We now relate the transformation of null-spaces when matrices X and A 
are transformed by the mapping /. 

Lemma 5.3. Let X e C"^" such that XAf{A) = and X*Af{A*) = both 
hold. Then f{X)Af{f{A)) = and {f{X))*Af({f{A))*) = 0. 

Proof. The following implications are easy to see. 

vr G Af{A) =^ Avr = and Xvr = 

vl e ^f{A*) =^ A*vl = and X*vl = 

WReM{f{A)) =^ aPA"PQwR = 

WLe^iifiA))*) =^ aQ*{A''PrP*WL = 

We consider the three op cases separately. Consider the first case when op 
denotes no change (that is Y°^ = Y). Then it is evident that f/{f{A)) = 
Q*AfiA) and Af{{f{A))*) = PAf{A*). Thus, 

f{X)Af{f{A)) = aPXQQ*Af{A) = aP{XAf{A)) = 
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and 

(/(X))W((/(A))*) = aQ*X*P*PM{A*) = aQ*{X*Af{A*)) = 0. 

This proves the resuh for the first case. 

In the second case, where op denotes conjugate-transpose, a similar ar- 
gument shows that AA(/(A)) = Q*Af{A*) and Af{{f{A))*) = PAf{A). Thus, 

/(X)A/'(/(A)) = aPX*QQ*Ar{A*) = aP{X*Ar{A*)) = 

and 

(/(X))*A/'((/(A))*) = aQ*XP*PAf{A) = aQ*{XAf{A)) = 0. 

This proves the resuh for the second case. 

In the third case, where op denotes transpose, we can follow similar steps 
asn show that A/'(/(A)) = Q*J\f{A*) and Afl{f{A))*) = PAf{A). Thus, 

/(X)A/'(/(A)) = aPX^QQ*Af{A*) = aPX*Af{A*) = 

and 

(/(X))*A/'((/(A))*) = aQ*XP*Pjf(A) = aQ*XM{A) = 0. 

This proves the result for the third case and we are done. D 

We now relate how the mapping / interacts with the operation Z that 



computes the sparsity pattern (Section 3.4). 



Lemma 5.4. Let A E Qmxn ^^^ f{A) = aPA°PQ as used earlier, then 
Z{f{A)) = \P\Z{A) \Q\. Here \-\ denotes entry-wise modulus. 

Proof. The proof is immediate by using the various properties of Z enumer- 
ated in Section 13.51 D 



We show how the solution of the full sparsification problem changes when 
the input matrix changes. 

Theorem 5.5. If X solves Equation M) for a given input A, then f{X) 



solves it for the input f{A), where f is given in Definition 5.1. 



Proof. The lemmas 5.2, 5.3[ and |5.4 show that each of the three compo- 



nents of the minimization problem - the sparsity constraints, the null-space 
related constraints, the form of the misfit functional - are invariant under 
transformation by /. This proves the statement. D 
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5.2. Invariance for specific matrix subspaces 

We now define a few specific complex permutation matrices (see Defini- 
tion 3.11). This allows us to succinctly characterize certain special matrix 



subspaces. 

Definition 5.6. For each m > 0, J^ is a matrix in 
anti-diagonal and zeros elsewhere. 



with ones on the 



For example, 



J:^ 



'0 





1" 





1 





1 









Definition 5.7. For each even m > 0, Km is the following skew-symmetric 
matrix in ]R™><™. 

II 



K„ 



Definition 5.8. For each m > 0, 



^„ 



Om-l 
1 



Im-l 
0^_ 



-J, 



and C„ 







Om-l I', 
-1 



m—1 



m—1 



where Om-i is the zero vector in 



D(m— l)xl 



The matrices C^ and C^ are used for cyclic permutations. Table 5.2 



shows how certain matrix subspaces can be characterized using these permu- 
tation matrices. The conditions satisfied by the sparsity pattern computed 
by Algorithm [T] can be easily proved using the properties mentioned in Sec- 
tion [3]5l 



Theorem 5.9. // the input matrix A for the problem in Equation ((Sy belongs 
to one of the following matrix subspaces - Hermitian, complex- symmetric, 
Hamiltonian, circulant, centro symmetric, persymmetric, or one of the skew 
counterparts - then the output matrix X belongs to the same subspace. 

Proof. If the input A satisfies any of the stated properties, then A = f{A) 



for a specific choice of / (see Definition 5.1 ) for one of the categories shown in 
Table 5.2 If X is the solution of Equation pi) corresponding to A, then using 
Theorem 5.5, f{X) is the solution corresponding to f{A). Since A = f{A), 



and as proved in Theorem 2.6, Equation ^ has a unique solution, it implies 
X = f{X). Thus X preserves the relevant property satisfied by A. D 
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Matrix subspace 


Condition on A 


Z = Z{A) satisfies 


centrosymmetric 


AJ-JA = 


ZJ-JZ = 


skew-centrosymmetric 


AJ+JA = 


ZJ-JZ = 


circulant 


AC+ - C+A = 


ZC+ - C+Z = 


skew-circulant 


AC- - C-A = 


ZC+ - C+Z = 


complex-symmetric 


A-A^ = 


Z-Z'^ = 


skew-complex-symmetric 


A + A^ = 


Z-Z^ = 


Hamiltonian 


KA + A*K = 


K Z - Z^ K\ = 


skew-Hamiltonian 


KA - A*K = 


K\Z -Z^ K\ = 


Hermitian 


A-A*=0 


Z-Z"^ = 


skew-Hermitian 


A + A* = 


Z-Z^ = 


persymmetric 


AJ -JA* = 


ZJ - JZ^ = 


skew-persymmetric 


AJ +JA* = 


ZJ - JZ^ = 


symmetric (real) 


A-A"^ = 


Z-Z^ = 


skew-symmetric (real) 


A + A^ = 


Z-Z^ = 



Table 1: A sumniary of conditions on a matrix A so that it belongs to a particular subspace 
of C™^™. Also shown are the conditions satisfied by the sparsity pattern computed by 
Algorithm [I] Definitions 5.6 5.7 and 5.8 give the expressions for the matrices J,K,C'^, 
and C^ . 
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6. Numerical results 

We now present some sparsification results for a fixed real asymmetric 
matrix AeR^°'"^'^, where 

A, = cos(3n5jy, (11) 

and the indices start at 1. This is a good candidate matrix since its con- 
dition number is approximately 621, which is similar to numbers found in 
spectral finite element matrices, we don't have to scale its rows and columns, 
it is deterministic, and it has some values that are near zero in magnitude. 
See Figure flta) for a 2-D plot of the matrix generated using the espy pro- 
gram [T5]. 

We first sparsify it using the following parameters: p = 1 and q = 0.8. 
Figure [l](b) shows the sparsity pattern and values of the output matrix X. 
We get the following data for this set of parameters - cond(X) = 552, 
cond(y4'''X) = 4.73, cond(Xy4"'') = 5.37, and X contains 597 non-zeros, which 
is approximately 37% density. Thus, nearly two-thirds of the entries are re- 
moved due to sparsification and the condition number is not far from 1, the 
minimal value. 

Figure |2] shows the singular values of A and X. Clearly, the lower end 
of the spectrum is perturbed less than the higher end. This shows that the 
algorithm is working as intended. 

It is desirable to compute three quantities for measuring sparsification 
performance. First is the condition number of A^X or XA\ second is the 
Frobenius norm difference of inverses, I IX''' — A''' 1 1 / 1 1 A''' 1 1 , and third is the 
number of non- zeros in X. The exact multiplication order in A^X or XA"^ is 
not particularly important. This is because almost always the two quantities 
are close to each other, as we have observed. Figure |3Fa) shows how the 
condition number varies when we vary q and p. Similarly, Figure [3Fb) shows 
how the number of non-zeros change on varying the parameters. Note that 
for a fixed q, increasing p leads to an increase in the number of non-zeros. 
The variation in condition number is reasonably smooth as long as not too 
many entries are discarded. An important measure of how much X deviates 
from A is the relative Frobenius norm difference of inverses. This is shown 
in Figure |4j The difference of inverses is important because our goal is to 
approximate the action of the inverse and not the operator itself. 

Based on the discussion above, it is natural to question which p to choose. 
We argue that the precise value oip is not important as long as one can change 
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Figure 1: Input dense A E M^*^^^", specified in Equation (11), and output sparse X for 
p = 1 and q — 0.8. Plots generated using the espy program |15j where darker pixel values 
correspond to larger magnitude. 
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Figure 2: Singular values of A, specified in Equation ( 11 ), and X for p = 1 and q = 0.8 
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Condition number of pinv(A) * X 
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Figure 3: Effect of varying p and q on conditioning and sparsity. The input A is specified 



in Equation (11 1 
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Relative Frobenius norm difference of inverses 
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Figure 4: Effect of varying p and q on the relative Frobenius norm difference of inverses 

(||Xt-At||^/pt||^). 

q to achieve a given number of non-zeros. An evidence is shown in Figure [5] 
where we plot the data for various p and q values together. It shows that the 
conditioning and relative difference are highly correlated with the number 
of non- zeros rather than the exact p and q values. The "curves" for five p 
values lie almost on top of each other when q is varied in [0.6, 1]. 

We show that the computed X is such that the eigenvalues of A'^X, which 
are same as the eigenvalues of XA^ , are clustered around a value near 1 on 
the real axis. See Figure p[a) and (b) for eigenvalues of A, X, and A^X. 

The last observation we have, which will be useful in the next paper, is 
that the non-zero entries of X are highly correlated with the entries of A at 
the preserved locations. Figure [7] shows the sorted entries of vectorized A 
and the corresponding entries in X for p = 1 and q = 0.9. 
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Measuring sensitivity to p 




10 



so 



go 



go 



30 
25 
20 
15 
10 
05 
00 



400 



(a) cond{A^X) 



Measuring sensitivity to p 



^■^^ 



Mt W M rmtx . X 



800 1200 

Number of non-zeros 



1600 



.p=l 
■ p=1.5 
i p=2 
X p=2.5 

>: p=inf 



(b) ||Xt-At||^/||At||^ 

Figure 5: The figures show that the choice of p is not too important if q can be varied to 
achieve a specific amount of sparsity. Conditioning and relative difference of inverses is 
highly correlated with sparsity and not with parameters p and q individually. 
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5 
4 

4' 

♦ 



♦ A aX 



iV.ii 



-*--0- 



-3 -2t*\^il 
♦ ♦ 

♦-3 

4' 

-4 - 




4 ^ 



* A * 



Real part 

(a) Eigenvalues of A and X 



1.0 



0.5 

t 
re 

a 

re 0.0 

'Sb 
re 

E 
-0.5 



00 



-1.0 



Eigenvalues of pinv(A) * X 



♦ ♦ ♦ 



»+| — ♦ ^ ^ ♦ <» $ 

^^* X ♦♦L.(t 
♦ ♦ ♦ 



1.5 



2.0 



Real part 



(b) Eigenvalues of A^X 



Figure 6: Eigenvalues before and after sparsification. The clustered values in the second 



figure show that X would perform well for preconditioning A specified in Equation (111. 
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Correlation between input and output values 
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Figure 7: Correlation between A and X values after sorting vectorized A and pairing 
corresponding X entries with it. 
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